Turbo Decoders for Audio-Visual Continuous Speech Recognition

نویسنده

Ahmed Hussen Abdelaziz

چکیده

Visual speech, i.e., video recordings of speakers’ mouths, plays an important role in improving the robustness properties of automatic speech recognition (ASR) against noise. Optimal fusion of audio and video modalities is still one of the major challenges that attracts significant interest in the realm of audiovisual ASR. Recently, turbo decoders (TDs) have been successful in addressing the audio-visual fusion problem. The idea of the TD framework is to iteratively exchange some kind of soft information between the audio and video decoders until convergence. The forward-backward algorithm (FBA) is mostly applied to the decoding graphs to estimate this soft information. Applying the FBA to the complex graphs that are usually used in large vocabulary tasks may be computationally expensive. In this paper, I propose to apply the forward-backward algorithm to a lattice of most likely state sequences instead of using the entire decoding graph. Using lattices allows for TD to be easily applied to large vocabulary tasks. The proposed approach is evaluated using the newly released TCD-TIMIT corpus, where a standard recipe for large vocabulary ASR is employed. The modified TD performs significantly better than the feature and decision fusion models in all clean and noisy test conditions.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Turbo-Decoding Weighted Forward-Backward Algorithm for Multimodal Speech Recognition

Since the performance of automatic speech recognition (ASR) still degrades under adverse acoustic conditions, recognition robustness can be improved by incorporating further modalities. The arising question of information fusion shows interesting parallels to problems in digital communications, where the turbo principle revolutionized reliable communication. In this paper, we examine whether th...

متن کامل

Audio - Visual Continuous Speech Recogni Markov Mode

With the increase in the computational complexity of recent computers, audio-visual speech recognition (AVSR) became an attractive research topic that can lead to a robust solution for speech recognition in noisy environments. In the audio visual continuous speech recognition system presented in this paper, the audio and visual observation sequences are integrated using a coupled hidden Markov ...

متن کامل

Introducing the Turbo-Twin-HMM for Audio-Visual Speech Enhancement

Models for automatic speech recognition (ASR) hold detailed information about spectral and spectro-temporal characteristics of clean speech signals. Using these models for speech enhancement is desirable and has been the target of past research efforts. In such model-based speech enhancement systems, a powerful ASR is imperative. To increase the recognition rates especially in low-SNR condition...

متن کامل

Design and recording of Czech speech corpus for audio-visual continuous speech recognition

In this paper we describe the design, recording, and content of a large audio-visual speech database intended for training and testing of audio-visual continuous speech recognition systems. The UWB05-HSCAVC database contains high resolution video and quality audio data suitable for experiments on audio-visual speech recognition. The corpus consists of nearly 40 hours of audiovisual records of 1...

متن کامل

Continuous Audio-visual Speech Recognition Continuous Audio-visual Speech Recognition

We address the problem of robust lip tracking, visual speech feature extraction, and sensor integration for audiovisual speech recognition applications. An appearance based model of the articulators, which represents linguistically important features, is learned from example images and is used to locate, track, and recover visual speech information. We tackle the problem of joint temporal model...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2017

Turbo Decoders for Audio-Visual Continuous Speech Recognition

نویسنده

چکیده

منابع مشابه

A Turbo-Decoding Weighted Forward-Backward Algorithm for Multimodal Speech Recognition

Audio - Visual Continuous Speech Recogni Markov Mode

Introducing the Turbo-Twin-HMM for Audio-Visual Speech Enhancement

Design and recording of Czech speech corpus for audio-visual continuous speech recognition

Continuous Audio-visual Speech Recognition Continuous Audio-visual Speech Recognition

عنوان ژورنال:

اشتراک گذاری